11 research outputs found

    The Pattern of R2 Retrotransposon Activity in Natural Populations of Drosophila simulans Reflects the Dynamic Nature of the rDNA Locus

    The pattern and frequency of insertions that enable transposable elements to remain active in a population are poorly understood. The retrotransposable element R2 exclusively inserts into the 28S rRNA genes where it establishes long-term, stable relationships with its animal hosts. Previous studies with laboratory stocks of Drosophila simulans have suggested that control over R2 retrotransposition resides within the rDNA loci. In this report, we sampled 180 rDNA loci of animals collected from two natural populations of D. simulans. The two populations were found to have similar patterns of R2 activity. About half of the rDNA loci supported no or very low levels of R2 transcripts with no evidence of R2 retrotransposition. The remaining half of the rDNA loci had levels of R2 transcripts that varied in a continuous manner over almost a 100-fold range and did support new retrotransposition events. Structural analysis of the rDNA loci in 18 lines that spanned the range of R2 transcript levels in these populations revealed that R2 number and rDNA locus size varied 2-fold; however, R2 activity was not readily correlated with either of these parameters. Instead, R2 activity was best correlated with the distribution of elements within the rDNA locus. Loci with no activity had larger contiguous blocks of rDNA units free of R2 insertions. These data suggest a model in which frequent recombination within the rDNA locus continually redistributes R2-inserted units, resulting in changing levels of R2 activity within individual loci and persistent R2 activity within the population.

    The CanOE Strategy: Integrating Genomic and Metabolic Contexts across Multiple Prokaryote Genomes to Find Candidate Genes for Orphan Enzymes

    Of all biochemically characterized metabolic reactions formalized by the IUBMB, over one out of four have yet to be associated with a nucleic or protein sequence, i.e. are sequence-orphan enzymatic activities. Few bioinformatics annotation tools are able to propose candidate genes for such activities by exploiting context-dependent rather than sequence-dependent data, and none are readily accessible and propose result integration across multiple genomes. Here, we present CanOE (Candidate genes for Orphan Enzymes), a four-step bioinformatics strategy that proposes ranked candidate genes for sequence-orphan enzymatic activities (or orphan enzymes for short). The first step locates “genomic metabolons”, i.e. groups of co-localized genes coding proteins catalyzing reactions linked by shared metabolites, in one genome at a time. These metabolons can be particularly helpful for aiding bioanalysts to visualize relevant metabolic data. In the second step, they are used to generate candidate associations between un-annotated genes and gene-less reactions. The third step integrates these gene-reaction associations over several genomes using gene families, and summarizes the strength of family-reaction associations by several scores. In the final step, these scores are used to rank members of gene families which are proposed for metabolic reactions. These associations are of particular interest when the metabolic reaction is a sequence-orphan enzymatic activity. Our strategy found over 60,000 genomic metabolons in more than 1,000 prokaryote organisms from the MicroScope platform, generating candidate genes for many metabolic reactions, including more than 70 distinct orphan reactions. A computational validation of the approach is discussed. Finally, we present a case study on the anaerobic allantoin degradation pathway in Escherichia coli K-12.
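
    The first step described in the abstract above (grouping co-localized genes whose reactions share metabolites) can be illustrated with a minimal sketch. This is not the CanOE implementation: the function name `find_metabolons`, the `max_gap` parameter, the integer gene positions, and the one-reaction-per-gene input format are all assumptions made for illustration only.

```python
from itertools import combinations


def find_metabolons(genes, reaction_metabolites, max_gap=2):
    """Group genes into candidate 'genomic metabolons' (illustrative only).

    genes: list of (gene_id, position, reaction_id) tuples (hypothetical format).
    reaction_metabolites: dict mapping reaction_id -> set of metabolite names.
    max_gap: assumed maximum distance in gene positions for co-localization.
    """
    # Union-find structure: each gene starts in its own group.
    parent = {gene_id: gene_id for gene_id, _, _ in genes}

    def find(x):
        # Follow parent links to the group representative, compressing paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (g1, p1, r1), (g2, p2, r2) in combinations(genes, 2):
        co_localized = abs(p1 - p2) <= max_gap
        share_metabolite = bool(reaction_metabolites[r1] & reaction_metabolites[r2])
        if co_localized and share_metabolite:
            parent[find(g1)] = find(g2)  # merge the two groups

    groups = {}
    for gene_id, _, _ in genes:
        groups.setdefault(find(gene_id), []).append(gene_id)

    # Keep only groups with at least two genes.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Toy data: three neighbouring genes form a metabolite chain; a fourth is far away.
toy_genes = [("a", 1, "R1"), ("b", 2, "R2"), ("c", 3, "R3"), ("d", 50, "R4")]
toy_reactions = {
    "R1": {"m1", "m2"},
    "R2": {"m2", "m3"},
    "R3": {"m3", "m4"},
    "R4": {"m4", "m5"},
}
print(find_metabolons(toy_genes, toy_reactions))
```

    In this toy input, genes a, b and c are grouped because adjacent pairs share a metabolite, while gene d is excluded because it lies too far away even though its reaction shares a metabolite with that of gene c. How the published method defines co-localization and handles genes with several reactions is not stated in the abstract, so those choices here are placeholders.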

    GO4genome: A Prokaryotic Phylogeny Based on Genome Organization

    Determining the phylogeny of closely related prokaryotes may fail in an analysis of rRNA or a small set of sequences. Whole-genome phylogeny utilizes the maximally available sample space. For a precise determination of genome similarity, two aspects have to be considered when developing an algorithm of whole-genome phylogeny: (1) gene order conservation is a more precise signal than gene content; and (2) when using sequence similarity, failures in identifying orthologues or the in situ replacement of genes via horizontal gene transfer may give misleading results. GO4genome is a new paradigm, which is based on a detailed analysis of gene function and the location of the respective genes. For characterization of genes, the algorithm uses gene ontology, enabling a comparison of function independent of evolutionary relationship. After the identification of locally optimal series of gene functions, their length distribution is utilized to compute a phylogenetic distance. The outcome is a classification of genomes based on metabolic capabilities and their organization. Thus, the impact of effects on genome organization that are not covered by methods of molecular phylogeny can be studied. Genomes of strains belonging to Escherichia coli, Shigella, Streptococcus, Methanosarcina, and Yersinia were analyzed. Differences from the findings of classical methods are discussed.

    A Computational Study of Elongation Factor G (EFG) Duplicated Genes: Diverged Nature Underlying the Innovation on the Same Structural Template

    BACKGROUND: Elongation factor G (EFG) is a core translational protein that catalyzes the elongation and recycling phases of translation. A more complex picture of EFG's evolution and function than previously accepted is emerging from analyses of heterogeneous EFG family members. Whereas gene duplication is postulated to be a prominent factor creating functional novelty, the striking divergence between EFG paralogs can be interpreted in terms of innovation in gene function. METHODOLOGY/PRINCIPAL FINDINGS: We present a computational study of the EFG protein family to cover the role of gene duplication in the evolution of protein function. Using phylogenetic methods, genome context conservation and insertion/deletion (indel) analysis, we demonstrate that the EFG gene copies form four subfamilies: EFG I, spdEFG1, spdEFG2, and EFG II. These ancient gene families differ by their indispensability, degree of divergence and number of indels. We show the distribution of EFG subfamilies and describe evidence for lateral gene transfer and recent duplications. Extended studies of the EFG II subfamily concern its diverged nature. Remarkably, EFG II appears to be a widely distributed and much-diversified subfamily whose subdivisions correlate with phylum or class borders. The EFG II subfamily-specific characteristics are low conservation of the GTPase domain and of domains II and III; absence of the trGTPase-specific G2 consensus motif "RGITI"; and twelve conserved positions common to the whole subfamily. The EFG II-specific functional changes could be related to changes in the properties of nucleotide binding and hydrolysis and strengthened ionic interactions between EFG II and the ribosome, particularly between parts of the decoding site and loop I of domain IV. CONCLUSIONS/SIGNIFICANCE: Our work, for the first time, comprehensively identifies and describes EFG subfamilies and improves our understanding of the function and evolution of EFG duplicated genes.